SUMMARY User selection (US) with zero-forcing beamforming is considered in fast fading Gaussian vector broadcast channels with perfect channel state information (CSI) at the transmitter. A novel criterion for US is proposed, which depends on both the CSI and the data symbols, whereas conventional criteria depend only on the CSI. Since the optimization of US based on the proposed criterion is infeasible, a greedy algorithm of data-dependent US is proposed to perform the optimization approximately. An overhead issue arises in fast fading channels: on every update of US, the transmitter might inform each user whether he/she has been selected, using a certain fraction of resources. This overhead results in a significant rate loss for fast fading channels. In order to circumvent this overhead issue, iterative detection and decoding schemes are proposed on the basis of belief propagation. The proposed iterative schemes require no information about whether each user has been selected. The proposed US scheme is compared to a data-independent US scheme, and the two schemes have comparable complexity for fast fading channels. Numerical simulations show that the proposed scheme can outperform the data-independent scheme for fast fading channels in terms of energy efficiency, bit error rate, and achievable sum rate.

key words: vector broadcast channels, data-dependent user selection, zero-forcing beamforming, fast fading channels, iterative decoding.

1. Introduction 

Multiple-input multiple-output broadcast channels (MIMO- 
BCs) are a mathematical model of downlink channels in 
which a base station with multiple transmit antennas com- 
municates with multiple receivers (users). In this paper, they 
are referred to as vector broadcast channels (VBCs) since 
the number of receive antennas is assumed to be one. 

The capacity region of MIMO-BCs has been shown to be achieved by dirty-paper coding (DPC) [1]–[5]. DPC is a sophisticated precoding scheme that pre-cancels the multiple-access interference (MAI) to each user at the transmitter, by utilizing the information about the data symbols transmitted to the other users. However, DPC is impractical because of its high complexity. Thus, a recent research issue is to construct a precoding scheme that achieves an appropriate tradeoff between complexity and performance.

In order to achieve a good tradeoff between complexity and performance, user selection (US) with zero-forcing (ZF) beamforming (ZFBF) has been considered [6]–[10]. US is based on a different idea from DPC: US aims to keep the MAI power as small as possible by selecting a subset of channel vectors with higher orthogonality, whereas DPC pre-cancels (possibly large) MAI by utilizing the information about the data symbols as well as channel state information (CSI). Yoo and Goldsmith [8] proved that a greedy algorithm of US with ZFBF can achieve the sum capacity as the number of users tends to infinity, even though it utilizes no information about the data symbols. Intuitively, this result can be understood as follows: the algorithm attempts to select a subset of channel vectors with higher orthogonality. When the number of users tends to infinity, it is possible to select a subset of users whose channel vectors are almost orthogonal. Consequently, the algorithm can achieve the sum capacity in that limit, while it is suboptimal for a finite number of users.

A crucial assumption for US is that of quasi-static or slow fading channels. This assumption becomes unrealistic as the mobility of users increases. Thus, it is important in practice to investigate fast fading channels. Note that "fast" and "slow" are relative terms. In this paper, fading is said to be fast when the coherence time is much shorter than the code length, which is determined by delay constraints [11].

The purpose of this paper is to construct a novel US- 
based communication scheme that is suitable for fast fading 
channels. An overhead issue arises in fast fading channels: 
US should be updated frequently for fast fading channels to 
keep track of the fading channels. On every update of US, 
the base station might inform each user whether he/she has 
been selected, using a certain fraction of resources. This 
overhead is negligibly small for quasi-static or slow fading 
channels, since the frequency of updates is low. However, the frequency of updates grows as the coherence time of the fading channels decreases. Consequently, the overhead results in a large rate loss for fast fading channels. In order to circumvent this overhead issue, we propose a communication
scheme that allows each user to detect whether he/she has 
been selected with no overhead. 

It is possible to attain an additional performance gain for fast fading channels. We propose a data-dependent criterion of US that combines the ideas of US and DPC, whereas the existing criteria of conventional US are data-independent [6]–[8]. A greedy algorithm of data-dependent US based on the proposed criterion is systematically derived to select a subset of channel vectors with high orthogonality and to pre-cancel MAI by using the information about the data symbols. Updating US frequently, i.e., using a small block size of US, decreases the number of interfering signals that must be pre-cancelled simultaneously. Thus, MAI can be pre-cancelled well if the block size of US is small. In other words, data-dependent US that pre-cancels MAI is suitable for fast fading channels.

The rest of this paper is organized as follows: After summarizing the notation used in this paper, a VBC is introduced in Section 2 as a mathematical model of downlink channels. In Section 3, a novel criterion of US is proposed on the basis of a lower bound on the achievable sum rate for the fast fading VBC. Furthermore, we present a systematic derivation of a greedy algorithm of data-dependent US based on the proposed criterion. In Section 4, we propose iterative receivers that allow each user to detect whether he/she has been selected. Numerical simulations presented in Section 5 show that the data-dependent scheme can outperform data-independent US for fast fading channels. Section 6 concludes this paper.

1.1 Notation

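Throughout this paper, $\underline{\boldsymbol{a}}$ denotes a row vector, while $\boldsymbol{a}$ represents a column vector. For a matrix $\boldsymbol{A}$, $\boldsymbol{A}^T$ and $\boldsymbol{A}^H$ stand for the transpose and the conjugate transpose of $\boldsymbol{A}$, respectively. For a full-rank matrix $\boldsymbol{A} \in \mathbb{C}^{K\times N}$ with $K \le N$, $\boldsymbol{A}^{\dagger}$ denotes the pseudo-inverse of $\boldsymbol{A}$, given by

$$\boldsymbol{A}^{\dagger} = \boldsymbol{A}^H(\boldsymbol{A}\boldsymbol{A}^H)^{-1}. \tag{1}$$

The matrix $\boldsymbol{I}_N$ represents the $N$-dimensional identity matrix. A circularly symmetric complex Gaussian distribution with variance $\sigma^2$ is denoted by $\mathcal{CN}(0, \sigma^2)$. For functions $f(x)$ and $g(x)$, $f(x) \propto g(x)$ means that $f(x)$ is proportional to $g(x)$, i.e., there is a constant $C$ such that $f(x) = Cg(x)$.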
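To make (1) concrete, the following NumPy sketch (our own illustration, not part of the original method) computes $\boldsymbol{A}^{\dagger} = \boldsymbol{A}^H(\boldsymbol{A}\boldsymbol{A}^H)^{-1}$ for a random full-row-rank complex matrix and checks that it is a right inverse, $\boldsymbol{A}\boldsymbol{A}^{\dagger} = \boldsymbol{I}$. The helper name `pseudo_inverse`, the matrix size, and the seed are arbitrary choices.

```python
import numpy as np

def pseudo_inverse(A: np.ndarray) -> np.ndarray:
    """Right pseudo-inverse A^H (A A^H)^{-1} of a full-row-rank K x N matrix A (K <= N), as in (1)."""
    gram = A @ A.conj().T                      # K x K Gram matrix A A^H (Hermitian, invertible)
    # Because gram is Hermitian, A^H gram^{-1} = (gram^{-1} A)^H; solve instead of inverting.
    return np.linalg.solve(gram, A).conj().T

rng = np.random.default_rng(0)
A = (rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))) / np.sqrt(2)
A_dag = pseudo_inverse(A)                      # 5 x 3 right inverse of A
```

For a full-row-rank matrix this coincides with the Moore-Penrose pseudo-inverse returned by `numpy.linalg.pinv`.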

2. Channel Model 

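We consider a $K$-user Gaussian VBC in which the base station has $N$ transmit antennas. The base station communicates with the users over $T$ time slots. The received signal $y_{k,t} \in \mathbb{C}$ of user $k$ with one receive antenna in time slot $t$ ($t = 0, 1, \ldots, T-1$) is given by

$$y_{k,t} = \frac{1}{\sqrt{\mathcal{E}}}\underline{\boldsymbol{h}}_{k,t}\boldsymbol{u}_t + n_{k,t}, \quad \text{for } k = 1, \ldots, K, \tag{2}$$

with

$$\mathcal{E} = \frac{1}{T}\sum_{t=0}^{T-1}\|\boldsymbol{u}_t\|^2. \tag{3}$$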

In (2), $\boldsymbol{u}_t \in \mathbb{C}^N$ and $n_{k,t} \sim \mathcal{CN}(0, N_0)$ denote the transmitted vector in time slot $t$ and the additive white Gaussian noise (AWGN) with variance $N_0$ for user $k$ in time slot $t$, respectively. The row vector $\underline{\boldsymbol{h}}_{k,t} \in \mathbb{C}^{1\times N}$ represents the channel gains between the transmitter and user $k$, with $\mathbb{E}[\underline{\boldsymbol{h}}_{k,t}^H\underline{\boldsymbol{h}}_{k,t}] = N^{-1}\boldsymbol{I}_N$. The normalization $\mathbb{E}[|(\underline{\boldsymbol{h}}_{k,t})_n|^2] = 1/N$ compensates for the power gain obtained by increasing the number of transmit antennas, so that $\mathbb{E}[\|\underline{\boldsymbol{h}}_{k,t}\|^2] = 1$. The coefficient $1/\sqrt{\mathcal{E}}$ in (2) implies that the average transmit power is restricted to 1.

Fig. 1 Transmitter.

The overloaded case $K > N$ is considered in this paper. The channel vectors $\underline{\boldsymbol{h}}_{k,t}$ for different users are assumed to be mutually independent. For simplicity, we assume perfect CSI at the transmitter, i.e., that all channel vectors $\{\underline{\boldsymbol{h}}_{k,t}\}$ are known to the transmitter. Note that this is an idealized assumption for time-division duplex (TDD) systems. The influence of channel estimation errors will be briefly noted in Section 4. For further simplification, phase shift keying (PSK) is assumed, and power allocation is not considered in this paper.

3. Transmitter 

3.1 Overview 

Figure 1 shows a diagram of the proposed transmitter. A binary information sequence $\{s_{k,l} \in \{0,1\} : l = 0, \ldots, L-1\}$ of length $L$ for user $k$ is first encoded by a per-user encoder with rate $r$. In order to combat burst errors, bit-interleaved coded modulation (BICM) with a PSK constellation $\mathcal{M} \subset \mathbb{C}$ is applied to the coded sequence. The obtained sequence $\{x_{k,t} \in \mathcal{M} : t = 0, \ldots, T-1\}$, with $T = L/(r\log_2|\mathcal{M}|)$, is fed to a ZF beamformer with US, proposed in Section 3.3. Since power allocation is not considered, $\mathbb{E}[|x_{k,t}|^2] = 1$ is assumed for all users. Let $\tilde{K}$ ($\le \min\{K, N\}$) and $B$ denote the number of selected users and the block size of US, respectively. US is updated every $B$ time slots, i.e., $\tilde{K}$ users are selected and kept fixed during $B$ time slots. Let $\mathcal{K}_t \subset \{1, \ldots, K\}$ denote the set of users selected in time slot $t$. Note that $\{\mathcal{K}_t\}$ are the same for all time slots belonging to the same block of US. The vector $\boldsymbol{u}_t$ transmitted in time slot $t$ is given by $\boldsymbol{u}_t = \boldsymbol{u}_{\mathcal{K}_t,t}$ [10], with

$$\boldsymbol{u}_{\mathcal{K}_t,t} = \boldsymbol{H}_{\mathcal{K}_t,t}^{\dagger}\boldsymbol{x}_{\mathcal{K}_t,t}. \tag{4}$$

In (4), the matrix $\boldsymbol{H}_{\mathcal{K}_t,t} \in \mathbb{C}^{\tilde{K}\times N}$ and the vector $\boldsymbol{x}_{\mathcal{K}_t,t} \in \mathbb{C}^{\tilde{K}}$ are generated by stacking the channel vectors $\{\underline{\boldsymbol{h}}_{k,t} \in \mathbb{C}^{1\times N} : k \in \mathcal{K}_t\}$ and the data symbols $\{x_{k,t} : k \in \mathcal{K}_t\}$ for the users $\mathcal{K}_t$ selected in time slot $t$, respectively. They must be stacked in the same order; otherwise, the data symbols would be sent to unintended users. The data symbols $\{x_{k,t} : k \notin \mathcal{K}_t\}$ for the non-selected users in time slot $t$ are discarded at the transmitter side. They are recovered at the receiver by utilizing the redundancy of the error-correcting code. Discarding the data symbols for the non-selected users reduces the computational complexity required in the receiver; the reason is explained in Remark 1 of Section 3.2.
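As an illustration of (4), the NumPy sketch below forms the ZFBF vector for a single time slot, assuming QPSK symbols and i.i.d. Rayleigh channel rows with $\mathbb{E}[\underline{\boldsymbol{h}}_{k,t}^H\underline{\boldsymbol{h}}_{k,t}] = N^{-1}\boldsymbol{I}_N$. The sizes, seed, selected set, and the helper name `zf_beamform` are hypothetical choices made only for the example. Before the $1/\sqrt{\mathcal{E}}$ scaling and the noise in (2), every selected user observes exactly its own symbol, while every non-selected user observes only the interference term $\underline{\boldsymbol{h}}_{k,t}\boldsymbol{u}_t$.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 8, 4                                   # illustrative sizes, overloaded case K > N
QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

# Channel rows h_k with i.i.d. CN(0, 1/N) entries, i.e. E[h_k^H h_k] = I_N / N.
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
x = rng.choice(QPSK, size=K)                  # one data symbol per user in this time slot

def zf_beamform(H, x, selected):
    """u = H_K^dagger x_K of (4); channel rows and symbols are stacked in the same order."""
    return np.linalg.pinv(H[selected]) @ x[selected]

selected = [0, 2, 5, 7]                       # hypothetical selected set with |K_t| = N
u = zf_beamform(H, x, selected)
rx = H @ u                                    # noiseless h_k u for every user (before 1/sqrt(E))
non_selected = [k for k in range(K) if k not in selected]
interference = rx[non_selected]               # interference seen by the non-selected users
```

Changing the order of `selected` changes the order of both the rows and the symbols together, which is why the stacking order in (4) must match.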

Fast fading channels are considered in this paper, as mentioned in Section 1. The code length $L/r$, or the length of interleaving $T = L/(r\log_2|\mathcal{M}|)$, is assumed to be much longer than the coherence time of the fading channels. Note that the dominant source of delay is not US but error-correcting coding, since the block size $B$ of US is comparable to the coherence time.

3.2 Criterion for US 

Sum rate and fairness should be taken into account as criteria for selecting users. For simplicity, however, we only consider a criterion based on the achievable sum rate, and propose a novel criterion for the fast fading VBC (2).

$T/B$ updates of US are performed during $T$ time slots, since the block size of US is $B$. We focus on block $j$ of US for $j = 0, 1, \ldots, T/B-1$. Applying the ZFBF (4) to the VBC (2) implies that, if user $k$ has been selected in block $j$, he/she receives the sum of the normalized data symbol $\mathcal{E}^{-1/2}x_{k,t}$ and the AWGN $n_{k,t}$ in the corresponding time slots $t = jB, \ldots, jB+B-1$. Otherwise, user $k$ receives the sum of the AWGN $n_{k,t}$ and the interference $\mathcal{E}^{-1/2}\underline{\boldsymbol{h}}_{k,t}\boldsymbol{u}_{\mathcal{K}_t,t}$, with (4), the latter of which is caused by the ZFBF intended for the selected users. These observations imply that the equivalent channel for user $k$ in block $j$ is given by

$$y_{k,t} = \frac{1}{\sqrt{\mathcal{E}}}\left[a_k^{(j)}x_{k,t} + \left(1-a_k^{(j)}\right)I_{k,t}\right] + n_{k,t} \tag{5}$$

for $t = jB, \ldots, jB+B-1$. In (5), $I_{k,t} = \underline{\boldsymbol{h}}_{k,t}\boldsymbol{u}_{\mathcal{K}_t,t}$ denotes the interference to user $k$, with (4). Furthermore, $a_k^{(j)} \in \{0, 1\}$ indicates whether user $k$ has been selected in block $j$, i.e.,

$$a_k^{(j)} = \begin{cases} 1 & \text{when user } k \text{ is selected in the } j\text{th block},\\ 0 & \text{otherwise}. \end{cases} \tag{6}$$

Note that $a_k^{(j)}$ is unknown to the receiver in advance.

Remark 1: Let us discuss why the data symbols for non-selected users should be discarded at the transmitter side. Because the transmitter discards the data symbols for non-selected users, the received signal in time slot $t$ can contain only the data symbol $x_{k,t}$ of the same time slot. What would occur if the transmitter kept the data symbols for the non-selected users and sent them later? The received signal in time slot $t$ might then contain a data symbol belonging to another time slot. Thus, each user would have to detect the index of the data symbol sent in time slot $t$. A simple method is to count how many times he/she has been selected. This method cannot yield the correct index unless all decisions of $a_k^{(j)}$ in the preceding blocks are correct. Consequently, serious error propagation would occur once (6) is detected incorrectly. This argument implies that a complicated receiver would be required for detecting (6) if the data symbols were not discarded. This is the reason why the data symbols for non-selected users should be discarded at the transmitter side.

We shall assess the achievable rate for user $k$. Let us assume that (6) can be detected with no errors. This assumption can be reasonable even for small $B$, as demonstrated numerically in Section 5. In this case, the equivalent channel (5) can be regarded as a Gaussian erasure channel, in which each erasure probability $\mathrm{Prob}(a_k^{(j)} = 0)$ may depend on the data symbol $x_{k,t}$ for user $k$ via the set of selected users $\mathcal{K}_t$. We ignore this dependency in this paper. The achievable rate under this assumption should provide a lower bound on the true one, since the receiver can obtain information about the data symbols from the observations of $\{a_k^{(j)}\}$. In order to evaluate the achievable rate, we need the average frequency at which each user is selected. The channel gains for each user become large or small block by block. This fading effect is averaged out for sufficiently large $T$ because of the assumption of fast fading. Consequently, all users should experience identical channel quality on average, so that each user should be selected with frequency $\tilde{K}/K$ as $T \to \infty$. Since the effective signal-to-noise ratio (SNR) for each selected user is equal to $(\mathcal{E}N_0)^{-1}$ from (5), a lower bound on the achievable rate of user $k$ for transmission over $T$ time slots is given by [11]

$$R_k = \frac{\tilde{K}}{K}\,C\!\left(\frac{1}{\mathcal{E}N_0}\right) \tag{7}$$

as $T \to \infty$, with

$$\mathcal{E} = \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\|\boldsymbol{u}_t\|^2. \tag{8}$$

In (7), $C(\gamma)$ denotes the achievable rate of the AWGN channel with SNR $\gamma$, defined as the mutual information $I(x_{k,t}; y_{k,t} \mid a_k^{(j)} = 1)$ between the data symbol and the received signal [12]. See Appendix A for the formal derivation of (7). Equation (7) implies that a lower bound $R$ on the achievable sum rate for the fast fading VBC (2) is given by

$$R = \sum_{k=1}^{K} R_k = \tilde{K}\,C\!\left(\frac{1}{\mathcal{E}N_0}\right). \tag{9}$$
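The lower bound (9) can be evaluated numerically once $C(\gamma)$ is fixed. The sketch below assumes uniformly distributed QPSK symbols (the constellation used in Section 5) and estimates $C(\gamma)$ by Monte Carlo integration of the mutual information; the function names, sample size, and seed are our own choices, and the value of $\mathcal{E}$ passed in must come from the energy penalty (8) of whichever US scheme is being evaluated.

```python
import numpy as np

QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def qpsk_capacity(snr: float, num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of C(snr) = I(x; y) in bits for y = x + n, x uniform QPSK, n ~ CN(0, 1/snr)."""
    rng = np.random.default_rng(seed)
    sigma2 = 1.0 / snr
    x = rng.choice(QPSK, size=num_samples)
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(num_samples) + 1j * rng.standard_normal(num_samples))
    y = x + n
    # log p(y | x') - log p(y | x) for every candidate symbol x' (zero for x' = x).
    log_ratio = -(np.abs(y[:, None] - QPSK[None, :]) ** 2 - np.abs(n)[:, None] ** 2) / sigma2
    m = log_ratio.max(axis=1)
    log_sum = m + np.log(np.exp(log_ratio - m[:, None]).sum(axis=1))   # stable log-sum-exp
    return float(np.log2(len(QPSK)) - np.mean(log_sum) / np.log(2))

def sum_rate_lower_bound(k_sel: int, energy_penalty: float, n0: float) -> float:
    """Lower bound (9): R = K_sel * C(1 / (E * N0)), with C taken as the QPSK mutual information."""
    return k_sel * qpsk_capacity(1.0 / (energy_penalty * n0))
```

For instance, `sum_rate_lower_bound(16, 2.0, 0.1)` evaluates (9) for $\tilde{K} = 16$ at the effective SNR $1/(2.0 \times 0.1) = 5$. Because $C(\gamma)$ is increasing in $\gamma$, a smaller $\mathcal{E}$ always yields a larger bound for fixed $\tilde{K}$.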



Remark 2: In the derivation of (7), we have implicitly assumed that data-dependent US does not change the distribution of the data symbol $x_{k,t}$. This assumption is valid for the PSK data symbols considered in this paper. However, the assumption does not hold for multi-level modulation, since the transmitter can reduce (8) by selecting users who transmit data symbols with small amplitudes.

Maximizing the achievable sum rate (9) for given $\tilde{K}$ and $B$ is equivalent to minimizing (8), since the achievable rate $C(\gamma)$ is a monotonically increasing function of $\gamma$. This conclusion is due to the assumption of equal powers for all users, i.e., $\mathbb{E}[|x_{k,t}|^2] = 1$. If power allocation were used, maximizing the achievable sum rate might not be equivalent to minimizing (8). The average power (8) of the transmitted vector should not be confused with the average transmit power, which is restricted to 1 owing to the coefficient $1/\sqrt{\mathcal{E}}$ in (2). The average power (8) should be regarded as a cost for performing the ZFBF (4). Thus, we hereafter refer to (8) as the energy penalty.

We first minimize the energy penalty for given $\tilde{K}$ and $B$. The number of selected users $\tilde{K}$ is chosen so as to maximize the achievable sum rate (9). On the other hand, the block size $B$ of US should be selected carefully on the basis of the energy penalty and the detection performance for (6). The details will be discussed in Section 5.3. The minimum of (8) for fixed $\tilde{K}$ and $B$, denoted by $\mathcal{E}_{\min}$, is achieved when the time average of $\{\|\boldsymbol{u}_t\|^2\}$ in each block is minimized:



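$$\mathcal{E}_{\min} = \lim_{T\to\infty}\frac{B}{T}\sum_{j=0}^{T/B-1}\min_{\mathcal{K}\subset\{1,\ldots,K\}}E_{\mathcal{K}}^{(j)}(B) = \mathbb{E}\left[\min_{\mathcal{K}\subset\{1,\ldots,K\}}E_{\mathcal{K}}^{(0)}(B)\right], \tag{10}$$

with the number of selected users fixed to $|\mathcal{K}| = \tilde{K}$. In (10), $E_{\mathcal{K}}^{(j)}(B)$ denotes the energy penalty for block $j$,

$$E_{\mathcal{K}}^{(j)}(B) = \frac{1}{B}\sum_{t=0}^{B-1}\|\boldsymbol{u}_{\mathcal{K},t+jB}\|^2, \tag{11}$$

where $\boldsymbol{u}_{\mathcal{K},t}$ is defined as in (4).

In conventional US, the energy penalty (11) may be minimized after taking the limit $B \to \infty$, in which (11) converges in probability to the conditional expectation with respect to the data symbols,

$$\bar{E}_{\mathcal{K}}^{(j)} = \mathbb{E}\left[\|\boldsymbol{u}_{\mathcal{K},t+jB}\|^2 \,\middle|\, \{\underline{\boldsymbol{h}}_{k,t+jB}\}\right]. \tag{12}$$

The minimizer of the energy penalty (11) depends on both the channel vectors and the data symbols, whereas the conventional criterion (12) never depends on the realizations of the data symbols. Note that the minimization and the limit $B \to \infty$ do not necessarily commute. It is straightforward to show that the energy penalty based on data-dependent US is no larger than the conventional one. Let $\mathcal{K}_{\mathrm{con}}$ denote the minimizer of the energy penalty (12) averaged over the data symbols. In both sides of the inequality

$$\min_{\mathcal{K}}E_{\mathcal{K}}^{(j)}(B) \le E_{\mathcal{K}_{\mathrm{con}}}^{(j)}(B), \tag{13}$$

we take the limit $B \to \infty$. Since $\mathcal{K}_{\mathrm{con}}$ is independent of the data symbols, the weak law of large numbers implies that the right-hand side tends to the minimum of (12), i.e., $\bar{E}_{\mathcal{K}_{\mathrm{con}}}^{(j)}$. This implies

$$\lim_{B\to\infty}\min_{\mathcal{K}}E_{\mathcal{K}}^{(j)}(B) \le \min_{\mathcal{K}}\lim_{B\to\infty}E_{\mathcal{K}}^{(j)}(B). \tag{14}$$

3.3 Data-Dependent User Selection

The minimization of the energy penalty (10) is infeasible because of its high complexity, as is the case for conventional criteria. Instead, we propose a greedy algorithm that performs the minimization approximately. Without loss of generality, we hereafter focus on the first $B$ time slots, and drop the superscript $(j)$. A small block size $B$ of US is used, so that one can postulate that the channels are fixed during one block of US, i.e., $\underline{\boldsymbol{h}}_{k,0} = \cdots = \underline{\boldsymbol{h}}_{k,B-1} = \underline{\boldsymbol{h}}_k$. For notational convenience, the set of users selected in time slots $t = 0, \ldots, B-1$ is denoted by $\mathcal{K}$. Furthermore, the matrix $\boldsymbol{H}_{\mathcal{K},t}$ is rewritten as $\boldsymbol{H}_{\mathcal{K}}$.

The derivation of the proposed algorithm is summarized in Appendix B. We first present several definitions used in the algorithm. In the proposed greedy algorithm, users are selected one by one. Let $\mathcal{K}(i) \subset \{1, \ldots, K\}$ denote the set of users selected in the first $i$ stages, with $|\mathcal{K}(i)| = i$. The ZFBF vector $\boldsymbol{u}_{\mathcal{K}(i),t} \in \mathbb{C}^N$ for the users $\mathcal{K}(i)$ selected in the first $i$ stages is given by $\boldsymbol{u}_{\mathcal{K}(i),t} = \boldsymbol{H}_{\mathcal{K}(i)}^{\dagger}\boldsymbol{x}_{\mathcal{K}(i),t}$. Furthermore, $\boldsymbol{P}_i^{\perp}$ denotes the projection matrix from $\mathbb{C}^{1\times N}$ onto the orthogonal complement of the subspace spanned by the channel vectors $\{\underline{\boldsymbol{h}}_k : k \in \mathcal{K}(i)\}$ selected in the first $i$ stages. The two matrices $\boldsymbol{H}_{\mathcal{K}(i)}^{\dagger}$ and $\boldsymbol{P}_i^{\perp}$ are calculated recursively as follows:

$$\boldsymbol{H}_{\mathcal{K}(i)}^{\dagger} = \left[\left(\boldsymbol{I}_N - \frac{\boldsymbol{P}_{i-1}^{\perp}\underline{\boldsymbol{h}}_{\bar{k}}^{H}\underline{\boldsymbol{h}}_{\bar{k}}}{\|\underline{\boldsymbol{h}}_{\bar{k}}\boldsymbol{P}_{i-1}^{\perp}\|^2}\right)\boldsymbol{H}_{\mathcal{K}(i-1)}^{\dagger},\ \frac{\boldsymbol{P}_{i-1}^{\perp}\underline{\boldsymbol{h}}_{\bar{k}}^{H}}{\|\underline{\boldsymbol{h}}_{\bar{k}}\boldsymbol{P}_{i-1}^{\perp}\|^2}\right], \tag{15}$$

$$\boldsymbol{P}_i^{\perp} = \boldsymbol{P}_{i-1}^{\perp} - \frac{\boldsymbol{P}_{i-1}^{\perp}\underline{\boldsymbol{h}}_{\bar{k}}^{H}\underline{\boldsymbol{h}}_{\bar{k}}\boldsymbol{P}_{i-1}^{\perp}}{\|\underline{\boldsymbol{h}}_{\bar{k}}\boldsymbol{P}_{i-1}^{\perp}\|^2}, \tag{16}$$

where $\bar{k}$ denotes the user selected in stage $i$, whose channel vector $\underline{\boldsymbol{h}}_{\bar{k}}$ is appended as the last row of $\boldsymbol{H}_{\mathcal{K}(i)}$.

The proposed greedy algorithm selects the user $k$ that minimizes (11) with $\mathcal{K} = \mathcal{K}(i) = \mathcal{K}(i-1)\cup\{k\}$ in stage $i$, which is recursively given by

$$E_{\mathcal{K}(i)}(B) = \frac{1}{B}\sum_{t=0}^{B-1}\frac{|x_{k,t} - \underline{\boldsymbol{h}}_k\boldsymbol{u}_{\mathcal{K}(i-1),t}|^2}{\|\underline{\boldsymbol{h}}_k\boldsymbol{P}_{i-1}^{\perp}\|^2} + E_{\mathcal{K}(i-1)}(B). \tag{17}$$

The proposed algorithm is summarized as follows:

Step 1 Set $i = 1$, $\mathcal{K}(0) = \emptyset$, $\boldsymbol{P}_0^{\perp} = \boldsymbol{I}_N$, $\boldsymbol{u}_{\mathcal{K}(0),t} = \boldsymbol{0}$, and $E_{\mathcal{K}(0)}(B) = 0$.
Step 2 Let $\mathcal{K}(i) = \mathcal{K}(i-1)\cup\{\bar{k}\}$, where the user $\bar{k} \in \{1, \ldots, K\}\setminus\mathcal{K}(i-1)$ minimizes (17).
Step 3 If $i = \tilde{K}$, output $\mathcal{K} = \mathcal{K}(i)$. Otherwise, compute (15) and (16), and go back to Step 2 after $i := i+1$.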
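The following NumPy sketch implements Steps 1–3 for one block with fixed channels, using the recursions (15)–(17) literally. The QPSK alphabet, problem sizes, seed, the numerical tolerance for nearly collinear channels, and the function names are our own assumptions. For such a small example it also computes the exact minimizer of (11) by exhaustive search over all $\binom{K}{\tilde{K}}$ subsets, i.e., the minimization inside (10) that becomes infeasible for realistic $K$.

```python
import itertools
import numpy as np

def greedy_data_dependent_us(H, X, k_sel):
    """
    Greedy data-dependent US (Steps 1-3) for one block with fixed channels.
    H: (K, N) channel rows h_k; X: (K, B) data symbols x_{k,t}; assumes k_sel <= N.
    Returns the selected users (in selection order) and the energy penalty E_K(B) of (11).
    """
    K, N = H.shape
    B = X.shape[1]
    selected = []
    P = np.eye(N, dtype=complex)               # P_0^perp = I_N
    H_dag = np.zeros((N, 0), dtype=complex)    # H_{K(0)}^dagger has no columns yet
    U = np.zeros((N, B), dtype=complex)        # u_{K(0),t} = 0 for t = 0, ..., B-1
    energy = 0.0                               # E_{K(0)}(B) = 0
    for _ in range(k_sel):
        best = None
        for k in range(K):
            if k in selected:
                continue
            hP = H[k] @ P                      # h_k P_{i-1}^perp
            denom = np.vdot(hP, hP).real       # ||h_k P_{i-1}^perp||^2
            if denom < 1e-12:
                continue                       # h_k (numerically) in the span of the selected rows
            numer = np.mean(np.abs(X[k] - H[k] @ U) ** 2)
            cand = numer / denom + energy      # recursion (17)
            if best is None or cand < best[0]:
                best = (cand, k, hP, denom)
        energy, k, hP, denom = best
        c = hP.conj() / denom                  # P h_k^H / ||h_k P||^2 (P is Hermitian)
        H_dag = np.hstack([H_dag - np.outer(c, H[k] @ H_dag), c[:, None]])   # (15)
        P = P - np.outer(hP.conj(), hP) / denom                                # (16)
        selected.append(k)
        U = H_dag @ X[selected]                # u_{K(i),t} for all t in the block
    return selected, energy

def exhaustive_energy(H, X, k_sel):
    """Brute-force minimum of (11) over all subsets of size k_sel (small K only)."""
    return min(
        np.mean(np.sum(np.abs(np.linalg.pinv(H[list(s)]) @ X[list(s)]) ** 2, axis=0))
        for s in itertools.combinations(range(H.shape[0]), k_sel)
    )

rng = np.random.default_rng(2)
K, N, B, K_SEL = 8, 4, 16, 4
QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
X = rng.choice(QPSK, size=(K, B))
sel, e_greedy = greedy_data_dependent_us(H, X, K_SEL)
e_opt = exhaustive_energy(H, X, K_SEL)
```

The recursion adds one term per stage because the new column of (15) is orthogonal to $\boldsymbol{u}_{\mathcal{K}(i-1),t}$, so `e_greedy` equals the energy penalty of the selected set computed directly with `numpy.linalg.pinv`. The inequality `e_opt <= e_greedy` always holds; the size of the gap depends on the realization.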

Expression (17) provides a useful interpretation of the proposed algorithm. In order to select the user minimizing (17), one should select a user for which the denominator of the first term on the right-hand side of (17) is large, or the numerator is small. Existing US algorithms have been proposed on the basis of maximizing the denominator $\|\underline{\boldsymbol{h}}_k\boldsymbol{P}_{i-1}^{\perp}\|^2$ [6] or its modifications [10]. Selecting the user that maximizes the denominator is equivalent to selecting a user whose channel vector is highly orthogonal to the channel vectors selected in the preceding stages. The point of the proposed algorithm is that the numerator is also taken into account, along with the denominator. The numerator becomes small when the amplitude and phase of the interference $\underline{\boldsymbol{h}}_k\boldsymbol{u}_{\mathcal{K}(i-1),t}$ are close to those of the transmitted symbol $x_{k,t}$. The proposed algorithm selects the user attaining an appropriate tradeoff between the two criteria, i.e., between maximizing the denominator and minimizing the numerator.

Remark 3: It is possible to derive a greedy algorithm of US based on the data-independent criterion (12) instead of the data-dependent criterion (11). The obtained algorithm is equivalent to the greedy algorithm proposed in [7] restricted to the case of no power allocation. We hereafter refer to this greedy algorithm as data-independent US.
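A minimal sketch of a data-independent greedy rule is given below for comparison. It assumes i.i.d. unit-power symbols with $\mathbb{E}[\boldsymbol{x}\boldsymbol{x}^H] = \boldsymbol{I}$, under which the conditional expectation (12) reduces to $\mathrm{tr}\{(\boldsymbol{H}_{\mathcal{K}}\boldsymbol{H}_{\mathcal{K}}^H)^{-1}\}$. It recomputes this trace for every candidate instead of using a recursion, and it is not claimed to reproduce the algorithm of [7] line by line; the function names are hypothetical.

```python
import numpy as np

def expected_energy(H_sel):
    """Data-independent penalty E[||H^dagger x||^2 | H] = tr((H H^H)^{-1}) when E[x x^H] = I."""
    gram = H_sel @ H_sel.conj().T
    return float(np.trace(np.linalg.inv(gram)).real)

def greedy_data_independent_us(H, k_sel):
    """Add, one by one, the user that minimizes the expected energy penalty of the enlarged set."""
    selected = []
    for _ in range(k_sel):
        candidates = [k for k in range(H.shape[0]) if k not in selected]
        best = min(candidates, key=lambda k: expected_energy(H[selected + [k]]))
        selected.append(best)
    return selected

rng = np.random.default_rng(4)
K, N, K_SEL = 8, 4, 4
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * N)
sel = greedy_data_independent_us(H, K_SEL)
```

In the first stage the penalty of a single user is $1/\|\underline{\boldsymbol{h}}_k\|^2$, so this rule starts with the strongest user; afterwards it never looks at the data symbols, in contrast to the numerator of (17).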

Let us evaluate the computational complexity of the proposed algorithm. In Step 2, the computational costs for calculating (4) with $\mathcal{K}_t = \mathcal{K}(i-1)$ and the numerator in (17) are $O(iBN)$ and $O(BN)$ in stage $i$, respectively. Furthermore, the complexity for evaluating the denominator in (17) is $O(N^2)$. Since the numerator and the denominator are evaluated for up to $K$ candidate users, the complexity required in Step 2 is $O(iBN + K(BN + N^2))$ in stage $i$. Similarly, we find that the complexity needed in Step 3 is $O(iN^2)$ in stage $i$. Thus, the complexity of the proposed algorithm is given by $O(\tilde{K}K\max(B, N)N)$, because of $\tilde{K} \le \min(K, N)$.
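Summing the per-stage costs stated above over the $\tilde{K}$ stages makes the final bound explicit. This is only a restatement of the arithmetic, using $\sum_{i=1}^{\tilde{K}} i = O(\tilde{K}^2)$, $\tilde{K}^2 \le \tilde{K}K$, and $B + N \le 2\max(B, N)$:

$$\sum_{i=1}^{\tilde{K}}\Big[O\big(iBN + K(BN+N^2)\big) + O(iN^2)\Big] = O\big(\tilde{K}^2(B+N)N + \tilde{K}K(B+N)N\big) = O\big(\tilde{K}K\max(B,N)N\big).$$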

Decreasing the block size $B$ results in increasing the frequency of US updates. Thus, we focus on the computational complexity per time slot, which is given by $O(\tilde{K}K\max(1, N/B)N)$ for the proposed algorithm. On the other hand, the complexity of the conventional US schemes proposed in [6]–[10], including the data-independent scheme in Remark 3, is $O(\tilde{K}KN^2)$. Thus, their complexity per time slot is equal to $O(\tilde{K}K(N/T_c)N)$, with $T_c$ denoting the coherence time of the fading channels, which is equal to the block size of the conventional US schemes. Imposing the constraint $B \le T_c$ implies $N/T_c \le \max(1, N/B)$, because of $N/T_c \le N/B \le \max(1, N/B)$; the equality holds if and only if $B = T_c \le N$. Thus, the complexity per time slot of the proposed scheme is of the same order as that of the conventional US schemes when $B = T_c \le N$, e.g., when $N = B = T_c$. This result implies that the proposed algorithm is efficient in terms of complexity for small $T_c$, since it is not easy in practice to use many transmit antennas.

4. Iterative Receivers 

4.1 Belief Propagation

The goal of the receiver is to perform the (bit-wise) maximum a posteriori (MAP) decoding of the information sequence $\{s_{k,l}\}$ given the received signals $\{y_{k,t}\}$ in all time slots. However, exact MAP decoding is infeasible in terms of computational complexity. Instead, we derive suboptimal iterative decoders based on message passing between a demodulator and a soft-input soft-output (SISO) decoder, using belief propagation (BP) [13], [14] (see Fig. 2). BP is a general algorithm for calculating marginal posterior probabilities for graphical models. If there are no cycles in the factor graph representing a graphical model, BP calculates the marginal posterior probabilities exactly. Even if there are cycles in the factor graph, BP may converge and provide a good approximation of the marginal posterior probabilities for certain sparse factor graphs. Notable examples are turbo codes [15]–[17], low-density parity-check (LDPC) codes [18], multiuser decoding [19]–[21], and iterative channel estimation and decoding [22]–[24]. We believe that it is possible to show that BP-based iterative algorithms converge if the length of interleaving in BICM is sufficiently longer than the coherence time of the channels, by applying an argument in [25].

Fig. 2 Iterative detection and decoding (demodulator, deinterleaver k, SISO decoder, interleaver k).

4.2 Soft-Decision Demodulator 

Existing BP-based SISO decoders can be used for calculat- 
ing the messages from the SISO decoder to the demodulator 
in Fig. [2] Thus, we only present the derivation of demod- 
ulators. The detection of © is performed block by block. 
Without loss of generality, we focus on the first block of 
US, and drop the superscript from or. Let P m (xk,t) denote 
the message with respect to x kJ sent by the decoder in iter- 
ation m. By the definition of BP [14, Chapter 2], the mes- 
sage Q m +i(x kJ ) with respect to x kJ fed back to the decoder 
is given by 

i B-i 

Qm+\{Xk,i) K ^ p{ak)p{yk,t\Xk,i,a k ) Y\ P(Hk,t'\ a k), 



fli=0 



f=0,f*t 



with 



P(Uk,f\ak) = ^ p(yk,t'\xk,t',a k )P m (x kJ ,). 



(18) 



(19) 



In d 1 8t >. p(a k ) denotes a prior probability of ©. Fur- 
thermore, the conditional probability density function (pdf) 
p(yk,r\xk,t',ak) represents the equivalent channel ©. In or- 
der to obtain an interpretable expression of (II St . we define 
the posterior probability of a k given {y k f : t' = 0, . . . , B - 
l,t' + t\ as 



, „ Ja.*W P( a k)I\t>i:tP(yk,Aak) 

p (.{yk,t> ■ t * t}) 



(20) 



with 
$$p(\{y_{k,t'} : t'\neq t\}) = \sum_{a_k=0}^{1} p(a_k)\prod_{t'\neq t}p(y_{k,t'}\mid a_k). \tag{21}$$

Dividing the right-hand side of (18) by the constant $p(\{y_{k,t'} : t'\neq t\})$ yields

$$Q_{m+1}(x_{k,t}) \propto \sum_{a_k=0}^{1} p(y_{k,t}\mid x_{k,t}, a_k)\,p(a_k\mid\{y_{k,t'} : t'\neq t\}). \tag{22}$$

Since $a_k$ is a binary variable, the posterior probability (20) is characterized by the posterior mean $\hat{a}_k = \sum_{a_k=0}^{1}a_k\,p(a_k\mid\{y_{k,t'} : t'\neq t\})$.

In order to evaluate the conditional pdf $p(y_{k,t}\mid x_{k,t}, a_k)$, we need the distribution of the interference $I_{k,t}$ in (5). However, its exact distribution is difficult to obtain. We use a Gaussian approximation instead: we approximate the distribution of $I_{k,t}$ by a circularly symmetric complex Gaussian distribution with variance $\sigma^2 = \mathbb{E}[|I_{k,t}|^2]$. This approximation simplifies the conditional pdf $p(y_{k,t}\mid x_{k,t}, a_k)$ to

$$p(y_{k,t}\mid x_{k,t}, a_k) = \frac{a_k}{\pi N_0}\exp\left(-\frac{|y_{k,t} - x_{k,t}/\sqrt{\mathcal{E}}|^2}{N_0}\right) + \frac{1-a_k}{\pi(N_0 + \sigma^2/\mathcal{E})}\exp\left(-\frac{|y_{k,t}|^2}{N_0 + \sigma^2/\mathcal{E}}\right). \tag{23}$$

In calculating the message (22), we need the prior probability $p(a_k)$, the noise variance $N_0$, the energy penalty $\mathcal{E}$, and the average power of the interference $\sigma^2$. For simplicity, we assume that the true values of these parameters are known in advance. In all numerical simulations, the true values are used. Note that it is straightforward to estimate these parameters in a decision-directed manner.

In summary, the message $Q_{m+1}(x_{k,t})$ is updated as follows: the posterior probability (20), or equivalently the posterior mean $\hat{a}_k$, is first calculated from the prior probability $p(a_k)$, (19), (23), and the messages $\{P_m(x_{k,t'})\}$. Next, the marginalization of the conditional pdf (23) over $a_k$ is calculated to obtain the message (22). We refer to this demodulator as the "soft-decision demodulator."

Remark 4: We have assumed that perfect CSI is available at the transmitter. This is an idealized assumption for TDD systems, in which the channel vectors are estimated on the basis of pilot signals transmitted through the reciprocal channel. If the channel estimation were imperfect, the equivalent channel (5) would include an additional interfering signal due to the channel estimation errors. There should not be much difference between the powers of the interfering signals for the data-dependent and data-independent schemes if the channel estimation errors are independent of the subset of selected users $\mathcal{K}$. In other words, the interfering signals for both schemes should have almost the same influence on the performance of the receiver. This argument allows us to assume perfect CSI at the transmitter, as far as the comparison between the data-dependent and data-independent schemes is concerned.

4.3 Hard-Decision Demodulator

In order to simplify the calculation of the message (22), we consider the hard decision of $a_k$ based on the MAP criterion

$$\hat{a}_k^{(\mathrm{MAP})} = \arg\max_{a_k\in\{0,1\}} p(a_k\mid\{y_{k,t'} : t'\neq t\}). \tag{24}$$

The message (22) is approximately calculated as

$$Q_{m+1}(x_{k,t}) \propto p\left(y_{k,t}\mid x_{k,t}, a_k = \hat{a}_k^{(\mathrm{MAP})}\right). \tag{25}$$

We refer to this demodulator as the "hard-decision demodulator." The MAP detection of $a_k$ is equivalent to the maximum likelihood (ML) detection of $a_k$ for $\tilde{K}/K = 1/2$. Note that, for both demodulators, the message (22) or (25) with respect to the data symbol $x_{k,t}$ is sent to the SISO decoder as soft information.
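The soft- and hard-decision demodulators of Sections 4.2 and 4.3 can be sketched for a single user and a single US block as follows. This is our own illustration, written in the log domain for numerical stability; it assumes QPSK and known parameters $p(a_k)$, $N_0$, $\mathcal{E}$, and $\sigma^2$ (as in the simulations), and it draws the interference of the non-selected user from the Gaussian approximation behind (23) rather than from an actual ZFBF. All function and variable names are hypothetical.

```python
import numpy as np

QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def demodulate_block(y, P_m, p_sel, n0, energy, sigma2, hard=False):
    """
    Messages Q_{m+1}(x_{k,t}) for one user over one US block, following (18)-(25).
    y: (B,) received samples; P_m: (B, 4) decoder messages over QPSK; p_sel = p(a_k = 1).
    Returns the (B, 4) normalized messages and the leave-one-out posterior means of a_k.
    """
    v1, v0 = n0, n0 + sigma2 / energy                      # variances of the two branches of (23)
    log_l1 = -np.abs(y[:, None] - QPSK[None, :] / np.sqrt(energy)) ** 2 / v1 - np.log(np.pi * v1)
    log_l0 = -np.abs(y) ** 2 / v0 - np.log(np.pi * v0)      # a_k = 0: independent of x_{k,t}
    # (19): p(y_{k,t'} | a_k), marginalizing x_{k,t'} with the decoder messages P_m
    log_p1 = _logsumexp(log_l1 + np.log(P_m), axis=1)
    log_p0 = log_l0
    # (20)-(21): posterior of a_k given all slots except t (leave-one-out sums)
    lp1 = np.log(p_sel) + log_p1.sum() - log_p1
    lp0 = np.log1p(-p_sel) + log_p0.sum() - log_p0
    norm = np.logaddexp(lp1, lp0)
    lp1, lp0 = lp1 - norm, lp0 - norm
    if hard:
        # (24)-(25): MAP decision on a_k, then keep only the chosen branch of (23)
        log_q = np.where((lp1 > lp0)[:, None], log_l1, log_l0[:, None])
    else:
        # (22): mix both branches of (23) with the posterior probabilities of a_k
        log_q = np.logaddexp(log_l1 + lp1[:, None], (log_l0 + lp0)[:, None])
    log_q = log_q - _logsumexp(log_q, axis=1)[:, None]
    return np.exp(log_q), np.exp(lp1)

def cn(rng, var, size):
    """Samples from CN(0, var)."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

# Illustrative block: one selected user and one non-selected user.
rng = np.random.default_rng(3)
B, energy, n0, sigma2, p_sel = 16, 2.0, 0.01, 1.0, 0.5
idx = rng.integers(0, 4, size=B)
y_sel = QPSK[idx] / np.sqrt(energy) + cn(rng, n0, B)                 # a_k = 1 in (5)
y_non = cn(rng, sigma2, B) / np.sqrt(energy) + cn(rng, n0, B)        # a_k = 0 in (5)
uniform = np.full((B, 4), 0.25)                                      # first iteration: flat messages
q_soft, a_sel = demodulate_block(y_sel, uniform, p_sel, n0, energy, sigma2)
q_hard, _ = demodulate_block(y_sel, uniform, p_sel, n0, energy, sigma2, hard=True)
_, a_non = demodulate_block(y_non, uniform, p_sel, n0, energy, sigma2)
```

Passing `hard=True` switches from (22) to (25). In this high-SNR example the leave-one-out posterior mean of $a_k$ is close to 1 for the selected user and close to 0 for the non-selected user, so both demodulators pass essentially the same symbol messages to the decoder; the two differ mainly when the decision on $a_k$ is uncertain.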

5. Numerical Simulations 

5.1 Energy Efficiency

The performance of data-dependent US is numerically compared to that of data-independent US, which is a greedy algorithm of US based on the criterion (12) instead of (11). As noted in Remark 3, data-independent US is a special case of the greedy US proposed in [7]. In all numerical simulations presented in this paper, quadrature phase shift keying (QPSK) is used. Furthermore, we assume independent and identically distributed (i.i.d.) Rayleigh block-fading channels with coherence time $T_c$, i.e., the channel vectors $\{\underline{\boldsymbol{h}}_{k,t}\}$ do not change during $T_c$ time slots, and at the beginning of the next fading block they are independently sampled from a circularly symmetric complex Gaussian distribution with covariance matrix $N^{-1}\boldsymbol{I}_N$. A more practical assumption might be block fading with correlations between adjacent blocks. However, such correlations have no influence on the energy penalty (8) if $B$ is smaller than the coherence time $T_c$, while they effectively shorten the length of interleaving.
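The block-fading model above can be generated as in the following sketch. It is our own illustration: the sizes $K = 32$, $N = 16$, and $T_c = 16$ follow the settings quoted for Figs. 4 and 5, while $T = 64$, the seed, and the function name are arbitrary.

```python
import numpy as np

def block_fading_channels(K, N, T, Tc, rng):
    """
    i.i.d. Rayleigh block fading: each h_{k,t} stays constant for Tc slots and is then
    redrawn independently from CN(0, I_N / N). Returns an array H with H[t, k] = h_{k,t}.
    """
    num_blocks = -(-T // Tc)                   # ceil(T / Tc)
    shape = (num_blocks, K, N)
    blocks = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2 * N)
    return np.repeat(blocks, Tc, axis=0)[:T]   # repeat every fading block Tc times, then truncate

rng = np.random.default_rng(5)
K, N, T, Tc = 32, 16, 64, 16
H = block_fading_channels(K, N, T, Tc, rng)
```

Any US block of size $B \le T_c$ aligned with the fading blocks then sees a constant channel, which is the situation assumed in Section 3.3.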

We first focus on the performance of the data-dependent US algorithm in terms of the energy penalty (8) as $T \to \infty$. Figure 3 shows the average energy penalty per selected user versus $K/N$. For comparison, the optimal US based on the data-dependent criterion (11) and the (suboptimal) data-independent US are also plotted. The QPSK inputs were independently sampled with equal probability. This assumption is justified for proper error-correcting codes in conjunction with BICM. We find that the greedy algorithm of data-dependent US outperforms the data-independent scheme, and that it achieves a nearly optimal energy penalty for small-to-moderate $K/N$.

Figure 4 shows the average energy penalty versus the block size $B$ of US. The energy penalty for the data-dependent scheme increases slowly toward that for the data-independent scheme as the block size $B$ grows. This is because, for large $B$, it is unlikely that the amplitudes and phases of the interference $\underline{\boldsymbol{h}}_k\boldsymbol{u}_{\mathcal{K}(i-1),t}$ in (17) are close to those of the data symbols for all time slots.

Fig. 3 $\mathcal{E}/\tilde{K}$ versus $K/N$ for $B = 16$ (curves: data-independent [7], data-dependent (greedy), data-dependent (optimum)).

Fig. 4 $\mathcal{E}/\tilde{K}$ versus $B$ for $K = 32$, $N = 16$, and $\tilde{K} = 16$.

5.2 BER 

The bit error rate (BER) of the data-dependent scheme is 
compared to that of the data-independent scheme. It is 
preferable to use error-correcting codes satisfying the fol- 
lowing three conditions: 

1. SISO decoding can be performed efficiently.

2. High performance can be achieved in the low-rate regime.

3. Robustness against erasures is provided.

Fig. 5 BER after 40 iterations. $K = 32$, $\tilde{K} = 16$ selected users, $N = 16$, $B = 16$, $T_c = 16$, $r = 1/4$, and $L = 4000$. Horizontal axis: transmit SNR per information bit in dB. Curves: data-dependent (lower bound, soft decision, hard decision) and data-independent [7] (lower bound, soft decision, hard decision).

Graph-based codes, such as turbo codes [13] and LDPC codes, satisfy the first condition. However, it is not straightforward to construct LDPC codes that satisfy the second condition [26] (see also [4]). For systematic codes such as turbo codes, the performance degrades significantly when systematic bits are erased. Thus, a non-systematic code is a reasonable option for satisfying the last condition. In this paper, we use a repeat-accumulate (RA) code [27], [28] with rate r. The RA code is a serial concatenation of a repetition code with rate r and an accumulator. In BICM, QPSK is used in conjunction with random uniform interleaving whose length equals the code length L/r.
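The encoder chain described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and the function names are hypothetical. It makes two assumptions:

- An interleaver is placed between the repetition code and the accumulator, as in the standard RA construction. The text mentions only the repetition code and the accumulator.
- The bit-to-QPSK mapping is Gray.

```python
import numpy as np

def ra_encode(info_bits, rep, rng):
    """Rate-1/rep repeat-accumulate encoder for a 0/1 integer array."""
    repeated = np.repeat(info_bits, rep)                  # outer code: repeat every bit rep times
    permuted = repeated[rng.permutation(repeated.size)]   # assumed RA-internal interleaver
    return np.bitwise_xor.accumulate(permuted)            # accumulator: c_n = c_{n-1} XOR b_n

def bicm_qpsk(code_bits, rng):
    """BICM: uniform random interleaving of all L/r code bits, then Gray-mapped unit-energy QPSK."""
    interleaved = code_bits[rng.permutation(code_bits.size)].astype(float)
    pairs = interleaved.reshape(-1, 2)                    # two code bits per symbol (L/r must be even)
    return ((1.0 - 2.0 * pairs[:, 0]) + 1j * (1.0 - 2.0 * pairs[:, 1])) / np.sqrt(2.0)

rng = np.random.default_rng(1)
info = rng.integers(0, 2, 4000, dtype=np.uint8)           # L = 4000 information bits
code = ra_encode(info, 4, rng)                            # r = 1/4: 16000 code bits
symbols = bicm_qpsk(code, rng)                            # 8000 QPSK symbols
```

With $L = 4000$ and $r = 1/4$, the interleaver length is $L/r = 16000$ bits, which gives 8000 QPSK symbols per user.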

Figure 5 presents the BERs for $N = B = T_c = 16$. The BERs of the data-dependent scheme for the soft-decision demodulator and the hard-decision demodulator are denoted by + and × markers connected with solid lines, respectively. The messages are updated in the order "demodulator → decoder for the inner code → decoder for the outer code → decoder for the inner code → demodulator → ⋯". The BER of genie-aided iterative decoding for the data-dependent scheme, in which a genie informs the receiver about the selection indicators $\{a_k^{(j)}\}$, is also shown by a solid line. Dashed lines are used instead of solid lines to represent the corresponding BERs for the data-independent scheme. The overall sum rate of all systems is equal to $2rK = 16$ bps/Hz. The transmit SNR per information bit is defined as $1/(2rKN_0) = 1/(16N_0)$. At a BER of $10^{-4}$, the data-dependent scheme provides a gain of 0.35 dB over the data-independent scheme. For both schemes, the BERs of the soft-decision demodulator are close to the corresponding genie-aided lower bounds. This implies that the soft-decision demodulator successfully detects whether user k has been selected, i.e., $\{a_k^{(j)}\}$. The gaps between the soft-decision and hard-decision demodulators correspond to the performance loss due to the hard decision of $\{a_k^{(j)}\}$. The soft decision achieves a slightly smaller BER than the hard decision.
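The rate and SNR bookkeeping above can be checked with a few lines of Python. The sketch takes all K users, not only the selected ones, to be coded at rate r with QPSK, as the numbers in the text imply: $2rK = 16$ bps/Hz for $K = 32$ and $r = 1/4$. The function names are hypothetical.

```python
import math

def sum_rate_bps_hz(r, K, bits_per_symbol=2):
    """Overall sum rate when all K users are coded at rate r with QPSK (2 coded bits per symbol)."""
    return bits_per_symbol * r * K

def snr_per_info_bit_db(N0, r, K):
    """Transmit SNR per information bit, 1 / (2 r K N0), in dB."""
    return 10.0 * math.log10(1.0 / (sum_rate_bps_hz(r, K) * N0))
```

For example, `sum_rate_bps_hz(0.25, 32)` gives 16 bps/Hz and `sum_rate_bps_hz(1/8, 32)` gives the 8 bps/Hz of the $r = 1/8$ setting.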

Fig. 6 BER after 40 iterations. $K = 32$, $\tilde{K} = 16$ selected users, $N = 16$, $B = 16$, $T_c = 32$, $r = 1/4$, and $L = 4000$.

Fig. 8 Achievable sum rate versus $\tilde{K}$. $K = 32$, $N = 16$, and $B = 16$.

Fig. 7 BER after 40 iterations. $K = 32$, $\tilde{K} = 8$ selected users, $N = 16$, $B = 16$, $T_c = 16$, $r = 1/8$, and $L = 4000$.

Figure 6 shows the BERs for $N = 16$ and $T_c = 32$. The block size of US for the data-dependent scheme is set to $B = 16$,† while the block size of US for the data-independent scheme is equal to the coherence time $T_c = 32$. Thus, US is performed twice as often in the data-dependent scheme as in the data-independent scheme. Interestingly, the diversity order (BER slope) of the data-dependent scheme differs from that of the data-independent scheme. This is because the two schemes use different US block sizes. The diversity order is determined by typical error events in the high SNR regime [9]. The data symbols of non-selected users are erased during one block of US, i.e., during B and $T_c$ time slots for the data-dependent and data-independent schemes, respectively. The number of erasure states fluctuates more strongly around its mean as the block size of US increases. In the high SNR regime, decoding typically fails when the number of erasure states deviates to a large value. As a result, the diversity order of the data-independent scheme is smaller than that of the data-dependent scheme.

† A comparison between the proposed schemes in Figs. 5 and 6 may be regarded as a comparison between users with different coherence times. Under the assumption that the coherence times are multiples of $T_c = 16$, the proposed scheme in Fig. 5 shows the performance for users with coherence time $T_c$, while that in Fig. 6 corresponds to users with coherence time $2T_c$.
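The effect of the US block length on erasure fluctuations can be illustrated with a deliberately simplified model. This is our simplification, not the paper's analysis. It assumes that a user is selected in each US block independently with probability $\tilde{K}/K$. One codeword of $L/(2r)$ QPSK symbols then spans $L/(2rb)$ US blocks of length b. The sketch computes the probability that the fraction of erased blocks exceeds a threshold. The function name and the threshold value are illustrative choices.

```python
import math

def erasure_tail_prob(n_symbols, block_len, p_selected, threshold):
    """P(fraction of erased US blocks > threshold) for one user's codeword, 0 < p_selected < 1.

    Simplifying assumption: selection is independent across US blocks with probability p_selected,
    so the number of erased blocks is Binomial(M, 1 - p_selected).
    """
    M = n_symbols // block_len                    # number of US blocks spanned by one codeword
    q = 1.0 - p_selected                          # per-block erasure probability
    total = 0.0
    for j in range(M + 1):
        if j / M > threshold:
            log_pmf = (math.lgamma(M + 1) - math.lgamma(j + 1) - math.lgamma(M - j + 1)
                       + j * math.log(q) + (M - j) * math.log1p(-q))
            total += math.exp(log_pmf)            # log-domain binomial pmf avoids overflow
    return total
```

Take $L = 4000$ and $r = 1/4$ (8000 symbols) with $\tilde{K}/K = 1/2$. Then `erasure_tail_prob(8000, 16, 0.5, 0.55)` for the block length $B = 16$ is smaller than `erasure_tail_prob(8000, 32, 0.5, 0.55)` for $T_c = 32$. This is consistent with the larger diversity order of the data-dependent scheme.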

Figure 7 shows the BERs for $\tilde{K} = 8$ selected users and $K = 32$. Since the RA code with $r = 1/8$ is used, the overall sum rate of all systems is 8 bps/Hz. The gaps between the genie-aided lower bounds and the BERs for the soft-decision demodulator are larger than those for $\tilde{K} = 16$, shown in Fig. 5. This observation can be understood as follows. The code rate r must be reduced as the ratio $\tilde{K}/K$ of selected users to all users decreases. Reducing r lowers the receive SNR at which SISO decoding operates. Consequently, the demodulator is forced to detect $\{a_k^{(j)}\}$ at lower receive SNRs, which increases the gaps between the lower bounds and the BERs for the soft-decision demodulator. Furthermore, the gaps between the BERs for the two demodulators are also larger than those in Fig. 5. This result implies that the soft decision of $\{a_k^{(j)}\}$ is an effective way to improve the performance at low SNRs.

5.3 Achievable Sum Rate 

We have so far investigated the performance of the data-dependent scheme for a fixed number $\tilde{K}$ of selected users and a fixed block size B. This section discusses how to choose $\tilde{K}$ and B. We first focus on B. In terms of computational complexity, one should choose $B \ge N$, since the complexity per time slot is $O(K\tilde{K}\max(1, N/B)N)$, as shown in Section 3.3. In terms of the energy penalty, B should be small, as shown in Fig. 4, whereas B should be large for accurate detection of $\{a_k^{(j)}\}$. One reasonable option is to choose the smallest B that meets the accuracy requirement for the detection of $\{a_k^{(j)}\}$ imposed by the error-correcting code in use.

We next discuss how to choose $\tilde{K}$, the number of selected users. When B is appropriately designed, we can assume that $\{a_k^{(j)}\}$ is known to the receiver. Then, one should choose $\tilde{K}$ to maximize the achievable sum rate (9). Figure 8 plots the achievable sum rate as a function of $\tilde{K}/N$. The achievable sum rates of the optimal data-dependent scheme and of the data-independent scheme are also shown in the same figure. For every SNR $1/N_0$ there is an optimal number $\tilde{K}_{\rm opt}$ of selected users, and $\tilde{K}_{\rm opt}$ increases as the SNR grows. These observations are consistent with the following information-theoretic intuition. In the low SNR regime, the whole power should be concentrated on sending messages to one user. In the high SNR regime, messages for multiple users should be sent simultaneously. In practice, $\tilde{K}_{\rm opt}$ may be estimated from the receive SNR $1/(\bar{\delta}N_0)$ fed back by each user.
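A rough numerical sketch of how $\tilde{K}_{\rm opt}$ could be found follows. Every modelling choice in it is our assumption for illustration:

- The sum rate is taken as $\tilde{K}\,C(1/(\bar{\delta}N_0))$, i.e., the per-user lower bound of Appendix A summed over the K users.
- $C(\cdot)$ is the mutual information of equiprobable QPSK over an AWGN channel.
- $\bar{\delta}$ is estimated by Monte Carlo as the average block energy of greedy data-dependent US over i.i.d. CN(0, 1) channels.
- K = 8 and N = 4 keep the run short; the paper uses K = 32 and N = 16.

All function names are hypothetical.

```python
import numpy as np

def qpsk_mutual_info(snr, n_nodes=64):
    """I(x; y) in bits for equiprobable unit-energy QPSK over complex AWGN at SNR snr (> 0).

    QPSK splits into two independent BPSK components, each seeing SNR snr, so
    I = 2 * (1 - E[log2(1 + exp(-2 y / s2))]) with y ~ N(1, s2) and s2 = 1 / snr.
    """
    z, w = np.polynomial.hermite.hermgauss(n_nodes)
    s2 = 1.0 / snr
    y = 1.0 + np.sqrt(2.0 * s2) * z                   # Gauss-Hermite nodes mapped to N(1, s2)
    expectation = np.sum(w * np.logaddexp(0.0, -2.0 * y / s2)) / np.sqrt(np.pi)
    return 2.0 * (1.0 - expectation / np.log(2.0))

def block_energy(H, X, S):
    """(1/B) sum_t x_{S,t}^H (H_S H_S^H)^{-1} x_{S,t} for ZF precoding of user set S."""
    idx = list(S)
    G = H[idx] @ H[idx].conj().T
    return float(np.real(np.sum(X[idx].conj() * np.linalg.solve(G, X[idx])))) / X.shape[1]

def greedy_energies(H, X, K_max):
    """Greedy data-dependent US; the greedy sets are nested, so one run yields E for 1..K_max users."""
    S, energies = [], []
    for _ in range(K_max):
        e, k = min((block_energy(H, X, S + [c]), c) for c in range(H.shape[0]) if c not in S)
        S.append(k)
        energies.append(e)
    return np.array(energies)

def optimal_num_selected(N0, K=8, N=4, B=16, trials=100, seed=0):
    """Return (K_opt, rates), where rates[i] = (i + 1) * C(1 / (delta_bar * N0))."""
    rng = np.random.default_rng(seed)
    delta_bar = np.zeros(N)
    for _ in range(trials):
        H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2.0)
        X = (rng.choice([-1.0, 1.0], (K, B)) + 1j * rng.choice([-1.0, 1.0], (K, B))) / np.sqrt(2.0)
        delta_bar += greedy_energies(H, X, N)         # ZF requires at most N selected users
    delta_bar /= trials
    K_sel = np.arange(1, N + 1)
    rates = np.array([k * qpsk_mutual_info(1.0 / (d * N0)) for k, d in zip(K_sel, delta_bar)])
    return int(K_sel[np.argmax(rates)]), rates
```

At very high SNR every $C(\cdot)$ saturates at 2 bits, so the sketch returns $\tilde{K}_{\rm opt} = N$. Sweeping `N0` shows how the maximizer moves with the SNR.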

6. Conclusions 

We have proposed a greedy algorithm of data-dependent US with ZFBF for fast fading Gaussian VBCs with perfect CSI at the transmitter. For the equal power case, the proposed US algorithm can outperform data-independent US in terms of energy efficiency, BER, and achievable sum rate, without increasing the order of the transmitter complexity for fast fading channels. We have also proposed iterative detection and decoding schemes based on BP. These schemes allow each user to detect whether he/she has been selected, without any training overhead, even for a small block size of US. Furthermore, we have discussed how to choose the two design parameters, the block size of US and the number of selected users, on the basis of the achievable sum rate. We conclude that data-dependent US is an efficient way to achieve a good tradeoff between performance and complexity for fast fading VBCs.

Appendix A: Derivation of (…)

Let us derive the lower bound on the achievable rate $R_k^{(\mathrm{opt})}$ of user k under the assumption that the selection indicators $\{a_k^{(j)}\}$ and the energy penalty $\delta$ are known to the receiver. The achievable rate $R_k^{(\mathrm{opt})}$ as $T \to \infty$ is equal to the mutual information per time slot between all data symbols and all variables known to the receiver [9]:



$$ R_k^{(\mathrm{opt})} = \lim_{T\to\infty}\frac{1}{T}\, I\big(\{x_{k,t}\}; \{y_{k,t}\}, \{a_k^{(j)}\} \,\big|\, \delta\big), \tag{A-1} $$



where $y_{k,t}$ is given by (…). Using the chain rule for mutual information [12] yields



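$$
\begin{aligned}
R_k^{(\mathrm{opt})} &= \lim_{T\to\infty}\frac{1}{T}\Big\{ I\big(\{x_{k,t}\}; \{a_k^{(j)}\} \,\big|\, \delta\big) + I\big(\{x_{k,t}\}; \{y_{k,t}\} \,\big|\, \{a_k^{(j)}\}, \delta\big)\Big\} \\
&\ge \lim_{T\to\infty}\frac{1}{T}\, I\big(\{x_{k,t}\}; \{y_{k,t}\} \,\big|\, \{a_k^{(j)}\}, \delta\big) \quad (= R_k).
\end{aligned}
\tag{A-2}
$$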



In the derivation of the lower bound (A-2), we have used the non-negativity of mutual information. Since we assume fast fading, $\delta$ is expected to converge in probability to a deterministic value $\bar{\delta}$ as $T \to \infty$. Thus, (A-2) reduces to



$$ R_k = \lim_{T\to\infty}\frac{1}{T}\, I\big(\{x_{k,t}\}; \{y_{k,t}\} \,\big|\, \{a_k^{(j)}\}, \delta = \bar{\delta}\big). \tag{A-3} $$

Let $\mathcal{J}_s = \{j \in \{0,\dots,T/B-1\} : a_k^{(j)} = 1\}$ denote the set of indices of the blocks in which user k has been selected. We consider a suboptimal receiver that uses only the received signals in the blocks $j \in \mathcal{J}_s$ to obtain the lower bound



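$$ R_k \ge \lim_{T\to\infty}\frac{\mathbb{E}[|\mathcal{J}_s|]}{T/B}\, C\!\left(\frac{1}{\bar{\delta}N_0}\right), \tag{A-4} $$

with

$$ C\!\left(\frac{1}{\bar{\delta}N_0}\right) = I\big(x_{k,t}; y_{k,t} \,\big|\, a_k^{(j)} = 1, \delta = \bar{\delta}\big), \tag{A-5} $$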



which is given via the equivalent channel (…). In (A-4), the expectation of $|\mathcal{J}_s|$ is defined as



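$$ \mathbb{E}[|\mathcal{J}_s|] = \sum_{\mathcal{J}_s \subseteq \{0,\dots,T/B-1\}} |\mathcal{J}_s| \,\mathrm{Prob}\big(\{a_k^{(j)} = 1 : j \in \mathcal{J}_s\},\ \{a_k^{(j)} = 0 : j \notin \mathcal{J}_s\}\big). \tag{A-6} $$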



The coefficient $\mathbb{E}[|\mathcal{J}_s|]/(T/B)$ in (A-4) equals the average frequency at which user k is selected. Because of the fast fading assumption, it tends to the ratio $\tilde{K}/K$ of selected users to all users as $T \to \infty$. Hence the lower bound (A-4) reduces to $R_k \ge (\tilde{K}/K)\,C\big(1/(\bar{\delta}N_0)\big)$.
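The per-user bound holds for every user k, because the coefficient in (A-4) tends to the same limit for all users. Summing it over the K users gives the corresponding sum-rate bound. Here $\tilde{K}$ is the number of selected users and $C(\cdot)$ is the per-slot mutual information of (A-5):

$$ \sum_{k=1}^{K} R_k \;\ge\; K\cdot\frac{\tilde{K}}{K}\,C\!\left(\frac{1}{\bar{\delta}N_0}\right) \;=\; \tilde{K}\,C\!\left(\frac{1}{\bar{\delta}N_0}\right). $$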

Appendix B: Derivation of Data-Dependent US Algorithm

We focus on the first block of US and drop the superscript $(j)$ in (…). The proposed US algorithm selects the user k that minimizes (…) with $\mathcal{K} = \mathcal{K}(i)$ in stage i. We first prove that $E_{\mathcal{K}(i)}(B)$ is given by the recursive formula (…). Step 1 of the proposed algorithm implies that the statement holds for $i = 1$. Thus, we assume $i > 1$. Let us define $H_{\mathcal{K}(i)} \in \mathbb{C}^{i\times N}$ as

$$ H_{\mathcal{K}(i)} = \begin{pmatrix} H_{\mathcal{K}(i-1)} \\ h_k \end{pmatrix} \tag{A-7} $$

for $k \in \{1,\dots,K\}\setminus\mathcal{K}(i-1)$. Substituting $u_{\mathcal{K}(i),t} = H^{\dagger}_{\mathcal{K}(i)} x_{\mathcal{K}(i),t}$ into (…) with $\mathcal{K} = \mathcal{K}(i)$ yields

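$$ E_{\mathcal{K}(i)}(B) = \frac{1}{B}\sum_{t=1}^{B} x_{\mathcal{K}(i),t}^{\mathrm H}\big(H_{\mathcal{K}(i)}H_{\mathcal{K}(i)}^{\mathrm H}\big)^{-1} x_{\mathcal{K}(i),t}. \tag{A-8} $$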

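Using the inversion formula for block matrices,

$$ \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}BE^{-1}CA^{-1} & -A^{-1}BE^{-1} \\ -E^{-1}CA^{-1} & E^{-1} \end{pmatrix} \tag{A-9} $$

with $E = D - CA^{-1}B$, we obtain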



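$$
\big(H_{\mathcal{K}(i)}H_{\mathcal{K}(i)}^{\mathrm H}\big)^{-1}
= \begin{pmatrix}
\big(H_{\mathcal{K}(i-1)}H_{\mathcal{K}(i-1)}^{\mathrm H}\big)^{-1} + \epsilon_k^{-1}\big(h_k H^{\dagger}_{\mathcal{K}(i-1)}\big)^{\mathrm H} h_k H^{\dagger}_{\mathcal{K}(i-1)} & -\epsilon_k^{-1}\big(h_k H^{\dagger}_{\mathcal{K}(i-1)}\big)^{\mathrm H} \\
-\epsilon_k^{-1} h_k H^{\dagger}_{\mathcal{K}(i-1)} & \epsilon_k^{-1}
\end{pmatrix}
\tag{A-10}
$$

with

$$ \epsilon_k = h_k P^{\perp}_{\mathcal{K}(i-1)} h_k^{\mathrm H}. \tag{A-11} $$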
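The block-inverse form of (A-10) and (A-11) can be checked numerically. The sketch builds $(H_{\mathcal{K}(i)}H_{\mathcal{K}(i)}^{\mathrm H})^{-1}$ from quantities of the previous stage, using $H^{\dagger} = H^{\mathrm H}(HH^{\mathrm H})^{-1}$, the projector $P^{\perp} = I_N - H^{\dagger}H$ and $\epsilon_k = h_k P^{\perp} h_k^{\mathrm H}$. For simplicity it recomputes the previous-stage inverse directly. The function name `block_inverse_update` is ours.

```python
import numpy as np

def block_inverse_update(H_prev, h):
    """(H H^H)^{-1} for H = [H_prev; h], assembled block by block as in (A-10) and (A-11)."""
    A_inv = np.linalg.inv(H_prev @ H_prev.conj().T)
    H_dag = H_prev.conj().T @ A_inv                      # pseudo-inverse of H_prev, shape N x (i-1)
    c = h @ H_dag                                        # row vector h_k H^dagger
    P_perp = np.eye(H_prev.shape[1]) - H_dag @ H_prev    # projector onto complement of the row space
    eps = float(np.real(h @ P_perp @ h.conj()))          # epsilon_k = h_k P_perp h_k^H
    top = np.hstack([A_inv + np.outer(c.conj(), c) / eps, -c.conj()[:, None] / eps])
    bottom = np.hstack([-c[None, :] / eps, np.array([[1.0 / eps]])])
    return np.vstack([top, bottom])
```

The test compares the result with a direct inversion of $H_{\mathcal{K}(i)}H_{\mathcal{K}(i)}^{\mathrm H}$.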

In (A-10) and (A-11), the Hermitian matrix $P^{\perp}_{\mathcal{K}(i-1)}$ is the projection matrix (15) from $\mathbb{C}^{1\times N}$ onto the orthogonal complement of the subspace spanned by the channel vectors $\{h_k : k \in \mathcal{K}(i-1)\}$ selected in the preceding stages. Substituting (A-10) into (A-8) and using (…) for $\mathcal{K} = \mathcal{K}(i-1)$, we arrive at the recursive formula (…).
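Carrying out this substitution explicitly may help the reader. The expansion below is our own, derived from (A-8) to (A-11), and we expect it to coincide with the recursive formula referenced above. It uses $x_{\mathcal{K}(i),t} = (x_{\mathcal{K}(i-1),t}^{\mathrm T}, x_{k,t})^{\mathrm T}$, $u_{\mathcal{K}(i-1),t} = H^{\dagger}_{\mathcal{K}(i-1)}x_{\mathcal{K}(i-1),t}$, and $\epsilon_k = h_k P^{\perp}_{\mathcal{K}(i-1)} h_k^{\mathrm H}$. The quadratic form in (A-8) then splits as

$$
\begin{aligned}
E_{\mathcal{K}(i)}(B)
&= \frac{1}{B}\sum_{t=1}^{B}\left[x_{\mathcal{K}(i-1),t}^{\mathrm H}\big(H_{\mathcal{K}(i-1)}H_{\mathcal{K}(i-1)}^{\mathrm H}\big)^{-1}x_{\mathcal{K}(i-1),t}
+ \frac{\big|x_{k,t}-h_k H^{\dagger}_{\mathcal{K}(i-1)}x_{\mathcal{K}(i-1),t}\big|^{2}}{\epsilon_k}\right] \\
&= E_{\mathcal{K}(i-1)}(B) + \frac{1}{B\,\epsilon_k}\sum_{t=1}^{B}\big|x_{k,t}-h_k u_{\mathcal{K}(i-1),t}\big|^{2}.
\end{aligned}
$$

The energy increment is small when the interference $h_k u_{\mathcal{K}(i-1),t}$ is close to the data symbol $x_{k,t}$ and when $\epsilon_k$ is large. This is the data-dependent effect discussed in Section 5.1.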

Next, we derive the recursive formula (…) for $H^{\dagger}_{\mathcal{K}(i)}$. It can be derived in the same manner as (…). Substituting (A-10) into $H^{\dagger}_{\mathcal{K}(i)} = H^{\mathrm H}_{\mathcal{K}(i)}\big(H_{\mathcal{K}(i)}H^{\mathrm H}_{\mathcal{K}(i)}\big)^{-1}$, we immediately obtain the recursive formula (…).
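Combining the recursions gives a greedy selection that never inverts a matrix. This sketch is our own arrangement, not code from the paper. Substituting (A-10) into $H^{\dagger}_{\mathcal{K}(i)} = H^{\mathrm H}_{\mathcal{K}(i)}(H_{\mathcal{K}(i)}H^{\mathrm H}_{\mathcal{K}(i)})^{-1}$ yields two rank-one updates:

- $u_{\mathcal{K}(i),t} = u_{\mathcal{K}(i-1),t} + \epsilon_k^{-1}P^{\perp}_{\mathcal{K}(i-1)}h_k^{\mathrm H}(x_{k,t} - h_k u_{\mathcal{K}(i-1),t})$
- $P^{\perp}_{\mathcal{K}(i)} = P^{\perp}_{\mathcal{K}(i-1)} - \epsilon_k^{-1}P^{\perp}_{\mathcal{K}(i-1)}h_k^{\mathrm H}h_k P^{\perp}_{\mathcal{K}(i-1)}$

The function name is hypothetical.

```python
import numpy as np

def greedy_data_dependent_us(H, X, K_sel):
    """Greedy data-dependent US with rank-one updates of the projector and ZF vectors.

    H: (K, N) channel rows h_k, constant over the block; X: (K, B) data symbols.
    Returns (selected users, ZF transmit vectors U with shape (N, B), block energy E).
    """
    K, N = H.shape
    B = X.shape[1]
    S = []
    U = np.zeros((N, B), dtype=complex)                # u_{K(i-1),t}, zero before stage 1
    P_perp = np.eye(N, dtype=complex)                  # projector onto complement of span{h_k, k in S}
    energy = 0.0
    for _ in range(K_sel):
        best = None
        for k in range(K):
            if k in S:
                continue
            g = P_perp @ H[k].conj()                   # P_perp h_k^H
            eps = float(np.real(H[k] @ g))             # epsilon_k = h_k P_perp h_k^H
            resid = X[k] - H[k] @ U                    # x_{k,t} - h_k u_{K(i-1),t} for all t
            inc = float(np.sum(np.abs(resid) ** 2)) / (B * eps)   # energy increment of adding k
            if best is None or inc < best[0]:
                best = (inc, k, g, eps, resid)
        inc, k, g, eps, resid = best
        S.append(k)
        energy += inc
        U = U + np.outer(g, resid) / eps               # rank-one update of the ZF vectors
        P_perp = P_perp - np.outer(g, g.conj()) / eps  # remove the new direction from the projector
    return S, U, energy
```

Each candidate evaluation costs $O(N^2 + NB)$ operations per block, i.e., $O(N^2/B + N)$ per time slot. Over all candidates and stages this is of order $K\tilde{K}N(1 + N/B)$ per slot, consistent with the $O(K\tilde{K}\max(1,N/B)N)$ complexity stated in the text.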